Data

library(readxl)
avg_price <- read_excel("/Users/ali/Documents/GitHub/asm-time-series-arimax/data/UK-house-price-index.xlsx", sheet = 3)
source("validation.R")

0. Pre-processing

london_price <- avg_price[2:dim(avg_price)[1], c("...1","LONDON")] # drop the header row; keep date and LONDON columns
london_price$LONDON <- as.numeric(london_price$LONDON) # convert prices to numeric

1. Identification

a) Determine the needed transformations to make the series stationary. Justify the transformations carried out using graphical and numerical results.

serie <- ts(london_price$LONDON, start = c(1995, 1), freq = 12)
plot(serie,main="London housing prices")
abline(v=1995:2024,col=4,lty=3)

We shall diagnose whether the series is stationary and, if it is not, transform it into a stationary one.

Is the variance constant?

To diagnose non-constant variance, we check two plots:

  • Boxplot
boxplot(serie~floor(time(serie)), xlab='time (years)')

  • Mean/Var plot
mserie <- matrix(serie, ncol = 12, byrow = TRUE) # one row per year
m <- apply(mserie, 1, mean) # mean of each row (year)
v <- apply(mserie, 1, var) # variance of each row (year)
plot(v ~ m, xlab='mean', ylab='variance')
abline(lm(v ~ m), col=2, lty=3)
text(m,v,1995:2024)

In the first plot we can observe that the variance is not constant across years: the boxes vary noticeably in size from year to year. In the second plot, higher values of the mean correspond to higher values of the variance. We should therefore consider transforming the scale; the Box-Cox method can tell us whether a logarithm is a good option.

library(forecast)
BoxCox.lambda(serie+1)
## [1] 0.5087956

The estimated lambda is about 0.51, between 0 (logarithm) and 1 (no transformation). Since the mean-variance plot shows the variance growing with the level, we opt for the simpler and more interpretable logarithmic transformation.
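For reference, the Box-Cox family interpolates between the raw scale (lambda = 1) and the logarithm (lambda = 0); a minimal base-R sketch of the transformation itself (the helper `boxcox_transform` is our own name, not part of the forecast package):

```r
# Box-Cox transform: (x^lambda - 1) / lambda, with log(x) as the lambda -> 0 limit
boxcox_transform <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 10, 100)
boxcox_transform(x, 0)    # identical to log(x)
boxcox_transform(x, 0.5)  # roughly a square-root scale, close to the estimated lambda
```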

lnserie=log(serie)
plot(lnserie)

We check the same plots to see whether the transformation helped.

#boxplot
boxplot(lnserie~floor(time(lnserie)), xlab='time (years)')

#mean/var plot
mserie <- matrix(lnserie, ncol = 12, byrow = TRUE) # one row per year
m <- apply(mserie, 1, mean) # mean of each row (year)
v <- apply(mserie, 1, var) # variance of each row (year)
plot(v ~ m, xlab='mean', ylab='variance')
abline(lm(v ~ m), col=2, lty=3)
text(m,v,1995:2024)

# comparison boxplot between series
par(mfrow=c(1,2))
boxplot(serie~floor(time(serie)), xlab='time (years)') 
boxplot(lnserie~floor(time(lnserie)), xlab='time (years)')

The boxes seem to have become smaller and more uniform after the transformation.

Is there a Seasonal Pattern?

monthplot(lnserie)

There is little variation across the months, so the series does not seem to have a strong seasonal pattern.

ts.plot(matrix(lnserie,nrow=12),col=1:8)

There does not seem to be a monthly seasonal pattern either. We will plot the additive decomposition of the series for more insight.

decomposed <- decompose(lnserie,type = "additive")
plot(decomposed)

We can see that the decomposition does extract a seasonal component, though its amplitude is small; we will explore this further later on.

Is the mean constant?

The log series shows an approximately linear upward trend, so the mean is clearly not constant. We apply one regular difference to try to make the mean constant:

d1lnserie=diff(lnserie)
plot(d1lnserie)
#abline(h=mean(lnserie))
abline(h=mean(d1lnserie),col=2)

The mean of lnserie itself is not constant over time; its overall level is:

mean(lnserie)
## [1] 12.4715

And the mean from d1lnserie is close to 0:

mean(d1lnserie)
## [1] 0.005490386

We shall check for over-differencing: take an extra difference and compare the variances.

d1d1lnserie=diff(d1lnserie)
plot(d1d1lnserie)
abline(h=0)
abline(h=mean(d1d1lnserie),col=2)

var(lnserie)
## [1] 0.357134
var(d1lnserie)
## [1] 0.0001493162
var(d1d1lnserie)
## [1] 0.0002410005
mean(lnserie)
## [1] 12.4715
mean(d1lnserie)
## [1] 0.005490386
mean(d1d1lnserie)
## [1] 4.04042e-05

Although the mean gets closer to 0, the variance increases when we take an extra regular difference. Therefore we do not need further differencing.
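The variance-comparison rule above can be automated; a minimal sketch (the helper `choose_d` is our own, illustrated on a simulated random walk rather than the housing series):

```r
# Pick the regular differencing order d that minimises the sample variance:
# differencing an already-stationary series again only inflates the variance
choose_d <- function(x, max_d = 3) {
  vars <- sapply(0:max_d, function(d) {
    var(if (d == 0) x else diff(x, differences = d))
  })
  which.min(vars) - 1  # convert from 1-based index to differencing order
}

set.seed(1)
rw <- cumsum(rnorm(300))  # random walk: one difference makes it stationary
choose_d(rw)              # 1, matching the single regular difference chosen above
```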

Final transformation for the London housing series: \(W_t = (1-B)\log(\text{serie})\)

b) Analyze the ACF and PACF of the stationary series to identify at least two plausible models. Reason about what features of the correlograms you use to identify these models.

The d1lnserie ACF is the following:

acf(d1lnserie,ylim=c(-1,1),lag.max=12*5,lwd=2,col=c(2,rep(1,11)), main ="ACF(d1lnserie)")

acf(d1lnserie,ylim=c(-1,1),lag.max=12*25,lwd=2,col=c(2,rep(1,11)), main ="ACF(d1lnserie)")

The ACF plot shows clear spikes at lag 12 and its multiples: a recurring cycle every 12 lags points to yearly seasonality in the data.

The ACF tails off slowly rather than cutting off abruptly at a single lag, which indicates autoregressive (AR) behaviour: a decaying pattern with infinitely many non-null lags. In this context it makes sense that the current house price depends on past prices.

And the PACF is the following:

pacf(d1lnserie,ylim=c(-1,1),col=c(rep(1,11),2),lwd=2,lag.max=12*5,main ="PACF(d1lnserie)")

pacf(d1lnserie,ylim=c(-1,1),col=c(rep(1,11),2),lwd=2,lag.max=12*25,main ="PACF(d1lnserie)")

Proposed models:

  • ARIMA(1,1,0)(1,0,1)[12]
  • ARIMA(1,1,1)(1,0,1)[12]
  • ARIMA(2,2,1)(1,0,1)[12]
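As a sanity check of the identification logic, the textbook signatures can be reproduced on simulated data (a sketch on a synthetic AR(1), not the housing series):

```r
# For a pure AR(1): the ACF decays geometrically (tails off),
# while the PACF cuts off after lag 1
set.seed(42)
sim <- arima.sim(model = list(ar = 0.7), n = 1000)

acf_vals  <- acf(sim, plot = FALSE)$acf[-1]  # drop lag 0
pacf_vals <- pacf(sim, plot = FALSE)$acf

acf_vals[1:3]   # slow geometric decay, roughly 0.7, 0.49, 0.34
pacf_vals[1:3]  # large at lag 1, near zero afterwards
```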

2. Estimation

a) Use R to estimate the identified models.

With the plots we observed some characteristics of the time series; now we will check numerically which model is best by optimizing the usual criteria: AIC (minimize), log-likelihood (maximize) and BIC (minimize).

Fitting the candidate models manually:

model1 <- arima(lnserie, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
summary(model1)
## 
## Call:
## arima(x = lnserie, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
## 
## Coefficients:
##          ar1    sar1     sma1
##       0.1715  0.9571  -0.7920
## s.e.  0.0535  0.0299   0.0754
## 
## sigma^2 estimated as 0.0001214:  log likelihood = 1096,  aic = -2184
## 
## Training set error measures:
##                        ME       RMSE        MAE         MPE       MAPE     MASE
## Training set 0.0008447912 0.01101892 0.00846293 0.007610119 0.06809117 0.803744
##                     ACF1
## Training set -0.06665244
model2 <- arima(lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
summary(model2)
## 
## Call:
## arima(x = lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1    sar1     sma1
##       0.9119  -0.7297  0.9714  -0.8490
## s.e.  0.0317   0.0465  0.0237   0.0659
## 
## sigma^2 estimated as 0.0001043:  log likelihood = 1122.49,  aic = -2234.99
## 
## Training set error measures:
##                        ME       RMSE         MAE         MPE       MAPE
## Training set 0.0003577645 0.01021758 0.007783959 0.003337475 0.06256963
##                   MASE       ACF1
## Training set 0.7392606 -0.1968962
model3 <- arima(lnserie, order = c(2,2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
summary(model3)
## 
## Call:
## arima(x = lnserie, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
## 
## Coefficients:
##           ar1      ar2      ma1    sar1     sma1
##       -0.5990  -0.2677  -0.4011  0.9880  -0.8849
## s.e.   0.0972   0.0796   0.0952  0.0105   0.0480
## 
## sigma^2 estimated as 9.665e-05:  log likelihood = 1129.93,  aic = -2247.86
## 
## Training set error measures:
##                         ME        RMSE        MAE           MPE       MAPE
## Training set -7.671904e-05 0.009839481 0.00759426 -0.0005212171 0.06098044
##                   MASE         ACF1
## Training set 0.7212444 0.0005329013
Arima(lnserie, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12)) # forecast::Arima also reports AICc and BIC
## Series: lnserie 
## ARIMA(2,2,1)(1,0,1)[12] 
## 
## Coefficients:
##           ar1      ar2      ma1    sar1     sma1
##       -0.5990  -0.2677  -0.4011  0.9880  -0.8849
## s.e.   0.0972   0.0796   0.0952  0.0105   0.0480
## 
## sigma^2 = 9.875e-05:  log likelihood = 1129.93
## AIC=-2247.86   AICc=-2247.62   BIC=-2224.63
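The criteria can be tabulated for any set of candidate fits; a self-contained sketch on a simulated series (the report's three models would be compared the same way, using the stats generics `logLik`, `AIC` and `BIC`):

```r
# Compare candidate ARMA specifications by log-likelihood, AIC and BIC
set.seed(7)
y <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 200)

specs <- list(c(1, 0, 0), c(1, 0, 1), c(2, 0, 1))
fits  <- lapply(specs, function(ord) arima(y, order = ord))

data.frame(order  = sapply(specs, paste, collapse = ","),
           loglik = sapply(fits, logLik),
           AIC    = sapply(fits, AIC),
           BIC    = sapply(fits, BIC))
```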

3. Validation

validation(model1)

## 
## --------------------------------------------------------------------
## 
## Call:
## arima(x = lnserie, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
## 
## Coefficients:
##          ar1    sar1     sma1
##       0.1715  0.9571  -0.7920
## s.e.  0.0535  0.0299   0.0754
## 
## sigma^2 estimated as 0.0001214:  log likelihood = 1096,  aic = -2184
## 
## Modul of AR Characteristic polynomial Roots:  1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 1.003659 5.829318 
## 
## Modul of MA Characteristic polynomial Roots:  1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624 1.019624

## 
## Psi-weights (MA(inf))
## 
## --------------------
##        psi 1        psi 2        psi 3        psi 4        psi 5        psi 6 
## 1.715467e-01 2.942826e-02 5.048319e-03 8.660222e-04 1.485632e-04 2.548552e-05 
##        psi 7        psi 8        psi 9       psi 10       psi 11       psi 12 
## 4.371956e-06 7.499945e-07 1.286591e-07 2.207103e-08 3.786211e-09 1.651281e-01 
##       psi 13       psi 14       psi 15       psi 16       psi 17       psi 18 
## 2.832717e-02 4.859432e-03 8.336193e-04 1.430046e-04 2.453196e-05 4.208376e-06 
##       psi 19       psi 20       psi 21       psi 22       psi 23       psi 24 
## 7.219329e-07 1.238452e-07 2.124522e-08 3.644547e-09 6.252099e-10 1.580470e-01 
## 
## Pi-weights (AR(inf))
## 
## --------------------
##        pi 1        pi 2        pi 3        pi 4        pi 5        pi 6 
##  0.17154666  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##        pi 7        pi 8        pi 9       pi 10       pi 11       pi 12 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.16512810 
##       pi 13       pi 14       pi 15       pi 16       pi 17       pi 18 
## -0.02832717  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
##       pi 19       pi 20       pi 21       pi 22       pi 23       pi 24 
##  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.13077967 
## 
## Descriptive Statistics for the Residuals
## 
## ----------------------------------------
##                   resi
## nobs        357.000000
## NAs           0.000000
## Minimum      -0.041723
## Maximum       0.034892
## 1. Quartile  -0.005863
## 3. Quartile   0.008048
## Mean          0.000845
## Median        0.000340
## Sum           0.301590
## SE Mean       0.000582
## LCL Mean     -0.000300
## UCL Mean      0.001990
## Variance      0.000121
## Stdev         0.011002
## Skewness     -0.209288
## Kurtosis      0.901107
## 
## Normality Tests
## 
## --------------------
## 
##  Shapiro-Wilk normality test
## 
## data:  resi
## W = 0.98634, p-value = 0.001902
## 
## 
##  Anderson-Darling normality test
## 
## data:  resi
## A = 1.0426, p-value = 0.009554
## 
## 
##  Jarque Bera Test
## 
## data:  resi
## X-squared = 15.302, df = 2, p-value = 0.0004755
## 
## 
## Homoscedasticity Test
## 
## --------------------
## 
##  studentized Breusch-Pagan test
## 
## data:  resi ~ I(obs - resi)
## BP = 0.75303, df = 1, p-value = 0.3855
## 
## 
## Independence Tests
## 
## --------------------
## 
##  Durbin-Watson test
## 
## data:  resi ~ I(1:length(resi))
## DW = 2.1873, p-value = 0.9574
## alternative hypothesis: true autocorrelation is greater than 0
## 
## 
## Ljung-Box test
##      lag.df  statistic      p.value
## [1,]      1   1.599354 2.059947e-01
## [2,]      2  32.227461 1.004374e-07
## [3,]      3  63.791286 9.092727e-14
## [4,]      4  71.361795 1.165734e-14
## [5,]     12 107.585909 0.000000e+00
## [6,]     24 126.698875 6.661338e-16
## [7,]     36 148.578670 1.221245e-15
## [8,]     48 157.537675 1.385558e-13
validation(model2)

## 
## --------------------------------------------------------------------
## 
## Call:
## arima(x = lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1    sar1     sma1
##       0.9119  -0.7297  0.9714  -0.8490
## s.e.  0.0317   0.0465  0.0237   0.0659
## 
## sigma^2 estimated as 0.0001043:  log likelihood = 1122.49,  aic = -2234.99
## 
## Modul of AR Characteristic polynomial Roots:  1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.00242 1.09656 
## 
## Modul of MA Characteristic polynomial Roots:  1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.013737 1.370444

## 
## Psi-weights (MA(inf))
## 
## --------------------
##      psi 1      psi 2      psi 3      psi 4      psi 5      psi 6      psi 7 
## 0.18225199 0.16620332 0.15156786 0.13822116 0.12604974 0.11495010 0.10482787 
##      psi 8      psi 9     psi 10     psi 11     psi 12     psi 13     psi 14 
## 0.09559698 0.08717894 0.07950218 0.07250141 0.18855315 0.08260921 0.07533484 
##     psi 15     psi 16     psi 17     psi 18     psi 19     psi 20     psi 21 
## 0.06870104 0.06265139 0.05713446 0.05210334 0.04751524 0.04333116 0.03951553 
##     psi 22     psi 23     psi 24 
## 0.03603588 0.03286265 0.14890470 
## 
## Pi-weights (AR(inf))
## 
## --------------------
##          pi 1          pi 2          pi 3          pi 4          pi 5 
##  0.1822519882  0.1329875311  0.0970397284  0.0708089609  0.0516686209 
##          pi 6          pi 7          pi 8          pi 9         pi 10 
##  0.0377020981  0.0275108601  0.0200744112  0.0146481057  0.0106885825 
##         pi 11         pi 12         pi 13         pi 14         pi 15 
##  0.0077993563  0.1281271535 -0.0181614587 -0.0132522426 -0.0096700345 
##         pi 16         pi 17         pi 18         pi 19         pi 20 
## -0.0070561316 -0.0051487917 -0.0037570240 -0.0027414645 -0.0020004204 
##         pi 21         pi 22         pi 23         pi 24 
## -0.0014596876 -0.0010651201 -0.0007772079  0.1033781501 
## 
## Descriptive Statistics for the Residuals
## 
## ----------------------------------------
##                   resi
## nobs        357.000000
## NAs           0.000000
## Minimum      -0.041512
## Maximum       0.033781
## 1. Quartile  -0.005891
## 3. Quartile   0.006263
## Mean          0.000358
## Median       -0.000314
## Sum           0.127722
## SE Mean       0.000541
## LCL Mean     -0.000707
## UCL Mean      0.001422
## Variance      0.000105
## Stdev         0.010226
## Skewness     -0.117473
## Kurtosis      1.033953
## 
## Normality Tests
## 
## --------------------
## 
##  Shapiro-Wilk normality test
## 
## data:  resi
## W = 0.98796, p-value = 0.004707
## 
## 
##  Anderson-Darling normality test
## 
## data:  resi
## A = 1.006, p-value = 0.01176
## 
## 
##  Jarque Bera Test
## 
## data:  resi
## X-squared = 17.436, df = 2, p-value = 0.0001636
## 
## 
## Homoscedasticity Test
## 
## --------------------
## 
##  studentized Breusch-Pagan test
## 
## data:  resi ~ I(obs - resi)
## BP = 1.0598, df = 1, p-value = 0.3033
## 
## 
## Independence Tests
## 
## --------------------
## 
##  Durbin-Watson test
## 
## data:  resi ~ I(1:length(resi))
## DW = 2.4081, p-value = 0.9999
## alternative hypothesis: true autocorrelation is greater than 0
## 
## 
## Ljung-Box test
##      lag.df statistic      p.value
## [1,]      1  13.95685 1.870550e-04
## [2,]      2  19.27533 6.522534e-05
## [3,]      3  31.29834 7.355588e-07
## [4,]      4  31.30517 2.652466e-06
## [5,]     12  51.06593 9.068729e-07
## [6,]     24  78.28989 1.136670e-07
## [7,]     36  90.10894 1.559193e-06
## [8,]     48  96.91819 3.730502e-05
validation(model3)

## 
## --------------------------------------------------------------------
## 
## Call:
## arima(x = lnserie, order = c(2, 2, 1), seasonal = list(order = c(1, 0, 1), period = 12))
## 
## Coefficients:
##           ar1      ar2      ma1    sar1     sma1
##       -0.5990  -0.2677  -0.4011  0.9880  -0.8849
## s.e.   0.0972   0.0796   0.0952  0.0105   0.0480
## 
## sigma^2 estimated as 9.665e-05:  log likelihood = 1129.93,  aic = -2247.86
## 
## Modul of AR Characteristic polynomial Roots:  1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.001006 1.932893 1.932893 
## 
## Modul of MA Characteristic polynomial Roots:  1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 1.010244 2.493415

## 
## Psi-weights (MA(inf))
## 
## --------------------
##         psi 1         psi 2         psi 3         psi 4         psi 5 
## -1.0000498046  0.3313627955  0.0691896329 -0.1301368397  0.0594317800 
##         psi 6         psi 7         psi 8         psi 9        psi 10 
## -0.0007667614 -0.0154482506  0.0094586316 -0.0015307725 -0.0016147788 
##        psi 11        psi 12        psi 13        psi 14        psi 15 
##  0.0013769690  0.1027261388 -0.1032572631  0.0343546959  0.0070596473 
##        psi 16        psi 17        psi 18        psi 19        psi 20 
## -0.0134240746  0.0061513437 -0.0000915207 -0.0015916510  0.0009778849 
##        psi 21        psi 22        psi 23        psi 24 
## -0.0001597246 -0.0001660671  0.0001422251  0.1018409187 
## 
## Pi-weights (AR(inf))
## 
## --------------------
##          pi 1          pi 2          pi 3          pi 4          pi 5 
## -1.000050e+00 -6.687368e-01 -2.682012e-01 -1.075638e-01 -4.313916e-02 
##          pi 6          pi 7          pi 8          pi 9         pi 10 
## -1.730124e-02 -6.938771e-03 -2.782839e-03 -1.116075e-03 -4.476092e-04 
##         pi 11         pi 12         pi 13         pi 14         pi 15 
## -1.795165e-04  1.030467e-01  1.030950e-01  6.894771e-02  2.765192e-02 
##         pi 16         pi 17         pi 18         pi 19         pi 20 
##  1.108998e-02  4.447708e-03  1.783782e-03  7.153971e-04  2.869146e-04 
##         pi 21         pi 22         pi 23         pi 24 
##  1.150689e-04  4.614913e-05  1.850841e-05  9.125561e-02 
## 
## Descriptive Statistics for the Residuals
## 
## ----------------------------------------
##                   resi
## nobs        357.000000
## NAs           0.000000
## Minimum      -0.038214
## Maximum       0.027894
## 1. Quartile  -0.006283
## 3. Quartile   0.005414
## Mean         -0.000077
## Median       -0.000337
## Sum          -0.027389
## SE Mean       0.000521
## LCL Mean     -0.001102
## UCL Mean      0.000949
## Variance      0.000097
## Stdev         0.009853
## Skewness      0.095842
## Kurtosis      0.677857
## 
## Normality Tests
## 
## --------------------
## 
##  Shapiro-Wilk normality test
## 
## data:  resi
## W = 0.98943, p-value = 0.01108
## 
## 
##  Anderson-Darling normality test
## 
## data:  resi
## A = 1.0988, p-value = 0.006943
## 
## 
##  Jarque Bera Test
## 
## data:  resi
## X-squared = 7.8097, df = 2, p-value = 0.02014
## 
## 
## Homoscedasticity Test
## 
## --------------------
## 
##  studentized Breusch-Pagan test
## 
## data:  resi ~ I(obs - resi)
## BP = 0.65351, df = 1, p-value = 0.4189
## 
## 
## Independence Tests
## 
## --------------------
## 
##  Durbin-Watson test
## 
## data:  resi ~ I(1:length(resi))
## DW = 1.995, p-value = 0.4599
## alternative hypothesis: true autocorrelation is greater than 0
## 
## 
## Ljung-Box test
##      lag.df    statistic    p.value
## [1,]      1 1.022366e-04 0.99193256
## [2,]      2 9.446917e-03 0.99528768
## [3,]      3 3.271435e-02 0.99844164
## [4,]      4 4.883515e-02 0.99970670
## [5,]     12 1.753563e+01 0.13053495
## [6,]     24 3.390122e+01 0.08643994
## [7,]     36 4.046930e+01 0.27952235
## [8,]     48 4.553930e+01 0.57423503

a) Perform the complete analysis of residuals, justifying all assumptions made. Use the corresponding tests and graphical results.

From the validation outputs above: the studentized Breusch-Pagan test does not reject homoscedasticity for any of the three models; the normality tests (Shapiro-Wilk, Anderson-Darling, Jarque-Bera) reject strict normality of the residuals in all three cases, although skewness and excess kurtosis are moderate; and the Ljung-Box statistics reject residual independence for models 1 and 2, while model 3 shows no significant residual autocorrelation at any of the lags considered.
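The Ljung-Box statistics reported by validation() correspond to base R's Box.test(); a minimal sketch on simulated data (not the actual model residuals):

```r
# Ljung-Box test: H0 = no autocorrelation up to the given lag
set.seed(123)
wn <- rnorm(200)                                  # white noise: H0 should hold
Box.test(wn, lag = 12, type = "Ljung-Box")

ar <- arima.sim(model = list(ar = 0.8), n = 200)  # strongly autocorrelated: H0 rejected
Box.test(ar, lag = 12, type = "Ljung-Box")
```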

b) Include analysis of the expressions of the AR and MA infinite models, discuss if they are causal and/or invertible and report some adequacy measures.

Model 1.

Causality and invertibility: the model is both causal and invertible. Analytically, the moduli of the roots of both the AR and MA characteristic polynomials are greater than 1, as shown in the validation output. Graphically, the inverse roots of both polynomials lie inside the unit circle.

Model 2.

Causality and invertibility: the model is both causal and invertible, for the same reason as model 1.

Model 3.

Causality and invertibility: the model is both causal and invertible, for the same reason as model 1.
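These statements can be verified directly with base R's polyroot(), using the model 1 coefficients copied from the estimation output above (a sketch; it reproduces the root moduli printed by validation()):

```r
# Causal/invertible <=> all characteristic-polynomial roots lie OUTSIDE the unit circle
ar_roots  <- polyroot(c(1, -0.1715))              # regular AR part: 1 - 0.1715 B
sar_roots <- polyroot(c(1, rep(0, 11), -0.9571))  # seasonal AR part: 1 - 0.9571 B^12
sma_roots <- polyroot(c(1, rep(0, 11), -0.7920))  # seasonal MA part: 1 - 0.7920 B^12

Mod(ar_roots)          # about 5.83, matching the validation output
range(Mod(sar_roots))  # about 1.0037 > 1 => causal
range(Mod(sma_roots))  # about 1.0196 > 1 => invertible
```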

c) Check the stability of the proposed models and evaluate their capability of prediction, reserving the last 12 observations.

We have seen that the model with the best AIC and log-likelihood is Model 3 (see the estimation output above). To compare predictive capability, however, we will look at out-of-sample forecasting errors (RMSE, MAE), reserving the last 12 observations as a test set.
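For reference, the two error metrics we rely on can be computed by hand (a minimal sketch; accuracy() reports the same quantities among others):

```r
# Root mean squared error and mean absolute error of a forecast
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))

rmse(c(1, 2, 3), c(1.1, 1.9, 3.2))  # sqrt(0.02), about 0.1414
mae(c(1, 2, 3), c(1.1, 1.9, 3.2))   # mean of 0.1, 0.1, 0.2, about 0.1333
```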

## reserve the last 12 observations as a test set
train_data <- lnserie[1:(length(lnserie) - 12)]
test_data <- lnserie[(length(lnserie) - 11):length(lnserie)]
## refit the three candidate models on the training data
model1 <- arima(train_data, order = c(1, 1, 0), seasonal = list(order = c(1, 0, 1), period = 12))
model2 <- arima(train_data, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
model3 <- arima(train_data, order = c(2,2, 1), seasonal = list(order = c(1, 0, 1), period = 12))

forecast_values1 <- forecast(model1, h = 12)
forecast_values2 <- forecast(model2, h = 12)
forecast_values3 <- forecast(model3, h = 12)

accuracy(forecast_values1, test_data)
##                        ME       RMSE         MAE         MPE       MAPE
## Training set  0.001059155 0.01110578 0.008561819  0.00930855 0.06898541
## Test set     -0.028730149 0.02996202 0.028730149 -0.21846636 0.21846636
##                   MASE       ACF1
## Training set 0.8083084 -0.0720078
## Test set     2.7123698         NA
accuracy(forecast_values2, test_data)
##                         ME       RMSE         MAE          MPE       MAPE
## Training set  0.0004948748 0.01029705 0.007865779  0.004423214 0.06332901
## Test set     -0.0259639947 0.02722785 0.025963995 -0.197437223 0.19743722
##                   MASE       ACF1
## Training set 0.7425962 -0.2045885
## Test set     2.4512213         NA
accuracy(forecast_values3, test_data)
##                         ME        RMSE         MAE           MPE      MAPE
## Training set -3.448897e-05 0.009891862 0.007631939 -0.0001886737 0.0614081
## Test set     -2.882083e-02 0.029704729 0.028820833 -0.2191253091 0.2191253
##                   MASE         ACF1
## Training set 0.7205198 0.0004217666
## Test set     2.7209311           NA

d) Select the best model for forecasting.

Based on the out-of-sample errors above, model 2 attains the lowest test RMSE and MAE, so we refit it on the full series and select it for forecasting.

model2 <- arima(lnserie, order = c(1, 1, 1), seasonal = list(order = c(1, 0, 1), period = 12))
best_model <- model2

4. Predictions

a) Obtain long term forecasts for the twelve months following the last observation available; provide also confidence intervals.

forecast_results <- forecast(best_model, h = 12)
plot(forecast_results)

print(forecast_results)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Oct 2024       13.16481 13.15172 13.17790 13.14479 13.18483
## Nov 2024       13.16318 13.14291 13.18345 13.13218 13.19418
## Dec 2024       13.16513 13.13825 13.19201 13.12402 13.20624
## Jan 2025       13.16622 13.13293 13.19950 13.11531 13.21713
## Feb 2025       13.16197 13.12238 13.20157 13.10141 13.22253
## Mar 2025       13.16143 13.11559 13.20727 13.09132 13.23154
## Apr 2025       13.16531 13.11329 13.21734 13.08575 13.24488
## May 2025       13.16673 13.10858 13.22487 13.07780 13.25566
## Jun 2025       13.17611 13.11191 13.24032 13.07792 13.27430
## Jul 2025       13.18356 13.11337 13.25375 13.07621 13.29091
## Aug 2025       13.18717 13.11106 13.26327 13.07078 13.30356
## Sep 2025       13.18680 13.10486 13.26873 13.06149 13.31210
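Since exp() is monotone, back-transforming the log-scale point forecasts and interval endpoints yields valid prices and intervals on the original scale; a quick check on the first horizon (values copied from the Oct 2024 row of the table above):

```r
# Oct 2024 forecast on the log scale (from the table above)
lo95 <- 13.14479; point <- 13.16481; hi95 <- 13.18483

exp(c(lo95, point, hi95))  # corresponding forecast and 95% interval in original price units
```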
# Subset the last 5 years of historical data for plotting
historical_data <- window(lnserie, start = time(lnserie)[length(time(lnserie)) - (5 * 12) + 1])

# Apply the exponential transformation to revert to the original scale
forecast_df <- data.frame(
  Time = c(time(historical_data), time(forecast_results$mean)),
  Value = c(exp(as.numeric(historical_data)), exp(as.numeric(forecast_results$mean))),
  Type = c(rep("Historical", length(historical_data)), rep("Forecast", length(forecast_results$mean))),
  Lower80 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$lower[, 1]))),  # 80% CI lower
  Upper80 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$upper[, 1]))),  # 80% CI upper
  Lower95 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$lower[, 2]))),  # 95% CI lower
  Upper95 = c(rep(NA, length(historical_data)), exp(as.numeric(forecast_results$upper[, 2])))   # 95% CI upper
)

library(ggplot2)

ggplot(forecast_df, aes(x = Time, y = Value, color = Type)) +
  geom_line(linewidth = 1) +  # line for historical and forecast data
  # 80% confidence interval ribbon
  geom_ribbon(data = subset(forecast_df, Type == "Forecast"), 
              aes(ymin = Lower80, ymax = Upper80), fill = "lightgreen", alpha = 0.3) +
  # 95% confidence interval ribbon
  geom_ribbon(data = subset(forecast_df, Type == "Forecast"), 
              aes(ymin = Lower95, ymax = Upper95), fill = "lightblue", alpha = 0.4) +
  labs(title = "Time Series with 1-Year Forecast and Confidence Intervals",
       x = "Time",
       y = "Housing Price (Original Scale)",
       color = "Data Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_continuous(breaks = seq(floor(min(forecast_df$Time)), ceiling(max(forecast_df$Time)), by = 1)) +
  scale_color_manual(values = c("Historical" = "black", "Forecast" = "blue"))